4. Experimentation: the Effect of Exploration on the Agent's Performance

Author

  • R. E. Schapire
Abstract

Model-based learning of interaction strategies in repeated games has received a lot of attention in the game-theory literature. Gilboa & Samet [10] deal with bounded regular players. They describe a model-based learning strategy for repeated games that learns the best response against any regular strategy. Their procedure enumerates the set of all automata and chooses as the current opponent model the first automaton in the sequence that is consistent with the current history. Exploration is achieved by designing a sequence of actions that distinguishes between the current model and the next consistent automaton in the enumeration. The risk involved in exploration is bypassed by assuming that the opponents' strategies are limited to strongly connected automata, where there are no "sinks" and there is always an opportunity to recover from a bad move. For such automata, the learning algorithm is guaranteed to converge to the best response in the limit. This learning procedure is based on exhaustive search in the space of automata and is therefore impractical for computationally bounded agents. The main role of the opponent model is to predict the opponent's future behavior. Choosing a proper class of strategies for modeling is essential for the success of the model-based strategy. If the model class is too restricted, it will probably fail to predict the opponent's behavior; on the other hand, a class that is too general will make both the best-response problem and the learning problem intractable. Often, there are many ways to model a given behavior. This paper concentrates on deterministic finite automata for modeling the agents' strategies. The question of how the model-based framework can be extended to more powerful agents remains open for future research.
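The enumerate-and-check procedure attributed to Gilboa & Samet above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not their published algorithm: it assumes two actions (hypothetically 0 = cooperate, 1 = defect), represents each opponent model as a Moore machine whose state emits the opponent's action and whose transitions read the player's action, and caps the enumeration at `max_states` states. All names (`Automaton`, `enumerate_automata`, `first_consistent`, `distinguishing_sequence`, `next_distinguishable`) are hypothetical. The distinguishing sequence is found by a breadth-first search over pairs of model states; as a simplification, consistent automata that behave identically to the current model are skipped rather than probed.

```python
from collections import deque
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Optional, Sequence, Tuple

Action = int          # hypothetical encoding: 0 = cooperate, 1 = defect
ACTIONS = (0, 1)


@dataclass(frozen=True)
class Automaton:
    """Opponent model: outputs[s] is the opponent's action in state s and
    delta[s][a] is its next state after the player plays a."""
    outputs: Tuple[Action, ...]
    delta: Tuple[Tuple[int, ...], ...]
    start: int = 0

    def state_after(self, player_actions: Sequence[Action]) -> int:
        s = self.start
        for a in player_actions:
            s = self.delta[s][a]
        return s

    def predict(self, player_actions: Sequence[Action]) -> List[Action]:
        """Opponent action predicted at each stage of the given play."""
        s, out = self.start, []
        for a in player_actions:
            out.append(self.outputs[s])
            s = self.delta[s][a]
        return out


def enumerate_automata(max_states: int) -> Iterator[Automaton]:
    """Fixed order: by number of states, then output labels, then transitions."""
    for n in range(1, max_states + 1):
        rows = list(product(range(n), repeat=len(ACTIONS)))  # candidate delta[s]
        for outputs in product(ACTIONS, repeat=n):
            for delta in product(rows, repeat=n):
                yield Automaton(outputs, delta)


def is_consistent(m: Automaton, history: Sequence[Tuple[Action, Action]]) -> bool:
    """history holds (player_action, opponent_action) pairs."""
    return m.predict([p for p, _ in history]) == [o for _, o in history]


def first_consistent(history, max_states: int) -> Optional[Automaton]:
    """Current opponent model: first automaton in the enumeration matching history."""
    return next((m for m in enumerate_automata(max_states) if is_consistent(m, history)), None)


def distinguishing_sequence(m1, m2, player_actions) -> Optional[List[Action]]:
    """Shortest action sequence after which m1 and m2 predict different
    opponent actions (BFS over pairs of states); None if they are equivalent."""
    start = (m1.state_after(player_actions), m2.state_after(player_actions))
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        (p, q), path = frontier.popleft()
        if m1.outputs[p] != m2.outputs[q]:
            return path
        for a in ACTIONS:
            nxt = (m1.delta[p][a], m2.delta[q][a])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None


def next_distinguishable(current, history, max_states) -> Optional[Tuple[Automaton, List[Action]]]:
    """First consistent automaton that some action sequence tells apart from `current`."""
    played = [p for p, _ in history]
    for m in enumerate_automata(max_states):
        if is_consistent(m, history):
            probe = distinguishing_sequence(current, m, played)
            if probe is not None:
                return m, probe
    return None


history = [(0, 0), (0, 0)]                     # both sides cooperated twice
model = first_consistent(history, max_states=2)
print(model)   # Automaton(outputs=(0,), delta=((0, 0),), start=0)
alt, probe = next_distinguishable(model, history, max_states=2)
print(probe)   # [1]
```

For n-state machines over an action set A, the loop visits |A|^n · n^(n·|A|) candidates, so even this toy version becomes infeasible beyond a handful of states, which is the sense in which exhaustive search is impractical for computationally bounded agents.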
… actions to the given history, and by applying the learning algorithm to the expanded histories, we acquire models that are consistent with the history but predict different opponent responses to the player's action sequences. To summarize, to explore the opponent's strategy using a mixed model, the agent first searches d stages forward to collect different opponent models for the support set. It then infers a belief distribution over this set. Following that, it finds the best response against the mixed model and performs the sequence of actions dictated by this strategy. By doing so, the agent rationally balances between exploration and …
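The last step of this summary, best-responding to a mixed model, can be illustrated with a short expectimax sketch. It assumes the support set has already been collected and that the belief weights are simply given and normalised (the abstract does not say how they are inferred). The Prisoner's Dilemma payoffs, the horizon `depth`, the discount `gamma`, and every function name here are hypothetical. Because each model is deterministic, conditioning the belief on an observed opponent action just drops the models that predicted otherwise and renormalises the rest. The sketch returns only the best first action, so an agent using it would re-plan at every stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Action = int  # hypothetical encoding: 0 = cooperate, 1 = defect
ACTIONS = (0, 1)

# Hypothetical Prisoner's Dilemma payoffs to the agent: PAYOFF[(agent, opponent)]
PAYOFF: Dict[Tuple[Action, Action], float] = {
    (0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0,
}


@dataclass(frozen=True)
class Automaton:
    """Deterministic opponent model: outputs[s] is its action in state s,
    delta[s][a] its next state after the agent plays a."""
    outputs: Tuple[Action, ...]
    delta: Tuple[Tuple[int, ...], ...]
    start: int = 0

    def state_after(self, agent_actions: Sequence[Action]) -> int:
        s = self.start
        for a in agent_actions:
            s = self.delta[s][a]
        return s


# Mixed model: (opponent model, its current state, probability)
Belief = List[Tuple[Automaton, int, float]]


def mixed_model(models, weights, agent_history) -> Belief:
    """Normalise the weights and advance each model past the shared history."""
    total = sum(weights)
    return [(m, m.state_after(agent_history), w / total) for m, w in zip(models, weights)]


def best_response(belief: Belief, depth: int, gamma: float = 1.0) -> Tuple[float, Optional[Action]]:
    """Expectimax over `depth` stages; returns (expected payoff, best first action)."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        # Group models by the action they predict the opponent plays this stage.
        groups: Dict[Action, Belief] = {}
        for m, s, w in belief:
            groups.setdefault(m.outputs[s], []).append((m, m.delta[s][a], w))
        value = 0.0
        for o, group in groups.items():
            p = sum(w for _, _, w in group)  # probability the opponent plays o
            if p == 0:
                continue
            # Conditioning on o keeps only the models that predicted it.
            posterior = [(m, s, w / p) for m, s, w in group]
            future, _ = best_response(posterior, depth - 1, gamma)
            value += p * (PAYOFF[(a, o)] + gamma * future)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


always_c = Automaton(outputs=(0,), delta=((0, 0),))
tit_for_tat = Automaton(outputs=(0, 1), delta=((0, 1), (0, 1)))
belief = mixed_model([always_c, tit_for_tat], [1.0, 1.0], agent_history=[0, 0])
print(best_response(belief, depth=3))  # (11.5, 1)
```

With an always-cooperate model and a tit-for-tat model weighted equally, a one-stage lookahead simply defects, and the three-stage lookahead also prefers defecting first: after that move the two models predict different responses, so the next observation tells them apart. Under other payoffs or weights the choice can change.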


Similar resources

The Effect of Electronic Portfolio Assessment on the Writing Performance of Iranian EFL Learners

The present study attempted to investigate the impact of electronic portfolio assessment on the writing performance of Iranian EFL learners. To do so, 30 advanced EFL learners who participated in a TOEFL preparation course were selected as the participants of the study. After administering a truncated version of a TOEFL proficiency test, they were randomly assigned to control and experimental...

Experimental and Simulation - Assisted Feasibility Study of Gas Injection to Increase Oil Recovery Using a Combination of Semi-VAPEX and GAGD Techniques

Gas injection into heavy oil reservoirs could result in high ultimate recovery of oil. Experimental studies showed that an application of a combined technology of Gas Assisted Gravity Drainage (GAGD) and Vapor Extraction (VAPEX) could increase final oil recovery of a candidate viscous oil reservoir. In this paper the results of laboratory investigation are presented, including Pressure-Volu...

Typical Ka band Satellite Beacon Receiver Design for Propagation Experimentation

This paper presents the design and simulation of a typical Ka band satellite beacon receiver for propagation experimentation. Using a satellite beacon signal as a reference signal is one of the most important methods in satellite wave propagation studies. Satellite beacons are frequently available for pointing large antennas, but such signals can be used for measuring the effect of natural phenome...

Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVM’s) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...

Investigation of Wear Behavior of Biopolymers for Total Knee Replacements Through Invitro Experimentation

The average life span of knee prosthesis used in Total Knee Replacement (TKR) is approximately 10 to 15 years. Literature indicates that the reasons for implant failures include wear, infection, instability, and stiffness. However, the majority of failures are due to wear and tear of the prosthesis. The most common biopolymer used in TKR is Ultra High Molecular Weight Polyethylene (UHMWPE). Pr...


Publication date: 1998